58 research outputs found

    It's Time to Consider "Time" when Evaluating Recommender-System Algorithms [Proposal]

    In this position paper, we question the current practice of reporting evaluation metrics for recommender systems as single numbers (e.g. precision p = .28 or mean absolute error MAE = 1.21). We argue that a single number expresses only the average effectiveness over a usually rather long period (e.g. a year or longer), which provides only a vague and static view of the data. We propose that recommender-system researchers instead calculate metrics over time series, such as weeks or months, and plot the results, e.g. in a line chart. The results then show how an algorithm's effectiveness develops over time and thus allow more meaningful conclusions about how the algorithm will perform in the future. In this paper, we explain our reasoning, illustrate it with an example, and present suggestions for what the community should do next.
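    A minimal sketch of the proposal, in Python: compute the same metric once over the whole log (the criticized practice) and once per month, then plot the monthly series as a line chart. The column names, the toy click log, and the use of click-through rate (CTR) as the metric are illustrative assumptions, not taken from the paper.

```python
# Single-number vs. time-series evaluation of a recommender (toy example).
import pandas as pd
import matplotlib.pyplot as plt

log = pd.DataFrame({
    "timestamp": pd.to_datetime(["2017-03-05", "2017-03-20", "2017-04-02",
                                 "2017-04-18", "2017-05-09"]),
    "clicked": [1, 0, 0, 1, 0],  # 1 = the displayed recommendation was clicked
})

# The criticized practice: one static number for the whole period.
print("overall CTR:", log["clicked"].mean())

# The proposal: one number per month, plotted over time.
monthly_ctr = log["clicked"].groupby(log["timestamp"].dt.to_period("M")).mean()
monthly_ctr.plot(marker="o", ylabel="CTR", title="Effectiveness over time")
plt.show()
```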

    An Empirical Comparison of Syllabuses for Curriculum Learning

    Syllabuses for curriculum learning have been developed on an ad-hoc, per-task basis, and little is known about the relative performance of different syllabuses. We identify a number of syllabuses used in the literature and compare them based on their effect on the speed of learning and the generalization ability of an LSTM network on three sequential learning tasks. We find that the choice of syllabus has limited effect on the generalization ability of a trained network. In terms of speed of learning, our results demonstrate that the best syllabus is task-dependent, but that a recently proposed automated curriculum learning approach, Predictive Gain, performs very competitively against all identified hand-crafted syllabuses. The best-performing hand-crafted syllabus, which we term Look Back and Forward, combines a syllabus that steps through tasks in order of their difficulty with a uniform distribution over all tasks. Our experimental results provide an empirical basis for the choice of syllabus on a new problem that could benefit from curriculum learning. Additionally, insights derived from our results shed light on how to successfully design new syllabuses.
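    A minimal sketch of the Look Back and Forward syllabus described above, in Python: with some probability, sample the next training task uniformly from all tasks; otherwise train on the current rung of a difficulty-ordered task sequence. The task names, the 50/50 mixing probability, and the progression rule are illustrative assumptions, not the paper's exact settings.

```python
# "Look Back and Forward": mix a difficulty-ordered curriculum with a
# uniform distribution over all tasks (toy sketch).
import random

tasks = ["copy-8", "copy-16", "copy-32", "copy-64"]  # ordered by difficulty

def next_task(current_idx: int, p_uniform: float = 0.5) -> str:
    """Choose the task for the next training batch."""
    if random.random() < p_uniform:
        return random.choice(tasks)  # look back (or ahead) uniformly
    return tasks[current_idx]        # current step of the curriculum

current = 0
for step in range(10):
    task = next_task(current)
    print(step, task)
    # ... train one batch on `task`; advance `current` once it is mastered
```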

    Document Embeddings vs. Keyphrases vs. Terms: An Online Evaluation in Digital Library Recommender Systems

    Many recommendation algorithms are available to operators of digital-library recommender systems, but the effectiveness of these algorithms is largely unreported in online evaluations. We compare a standard term-based recommendation approach to two promising approaches for related-article recommendation in digital libraries: document embeddings and keyphrases. We evaluate the consistency of their performance across multiple scenarios. Through our recommender-as-a-service Mr. DLib, we delivered 33.5M recommendations to users of Sowiport and Jabref over the course of 19 months, from March 2017 to October 2018. The effectiveness of the algorithms differs significantly between Sowiport and Jabref (Wilcoxon rank-sum test; p < 0.05). In each scenario, there is a ~400% difference in effectiveness between the best and worst algorithm. The best-performing algorithm in Sowiport (terms) is the worst-performing in Jabref. The best-performing algorithm in Jabref (keyphrases) performs 70% worse in Sowiport than Sowiport's best algorithm (click-through rate: 0.1% terms, 0.03% keyphrases).
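    A minimal sketch of a term-based related-article recommender like the baseline above, in Python: represent articles as TF-IDF term vectors and recommend the most similar documents. The toy corpus is invented; Mr. DLib's actual pipeline, and its document-embedding and keyphrase variants, are more involved.

```python
# Term-based related-article recommendation via TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "collaborative filtering for digital libraries",
    "content-based recommendation of scholarly articles",
    "document embeddings for related-article search",
]
vectors = TfidfVectorizer().fit_transform(corpus)

query = 0  # recommend articles related to corpus[0]
scores = cosine_similarity(vectors[query], vectors).ravel()
ranking = scores.argsort()[::-1][1:]  # best first, excluding the query itself
print([corpus[i] for i in ranking])
```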

    Apache Lucene as Content-Based-Filtering Recommender System: 3 Lessons Learned

    For the past few years, we have used Apache Lucene as the recommendation framework in the scholarly-literature recommender system of our reference-management software Docear. In this paper, we share three lessons learned from our work with Lucene. First, recommendations with relevance scores below 0.025 tend to have significantly lower click-through rates than recommendations with relevance scores above 0.025. Second, picking ten recommendations randomly from Lucene's top-50 search results decreased click-through rates by 15%, compared to recommending the top-10 results. Third, the number of returned search results tends to predict how high click-through rates will be: when Lucene returns fewer than 1,000 search results, click-through rates tend to be around half as high as when 1,000+ results are returned.

    Comment: Accepted for publication at the 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR2017).
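    A minimal sketch of lessons one and two, in Python: filter out candidates below the 0.025 relevance-score threshold, and recommend the top 10 results rather than 10 random picks from the top 50. The (document, score) pairs are invented stand-ins for Lucene's response.

```python
# Applying the relevance-score threshold and top-10 selection (toy data).
import random

THRESHOLD = 0.025  # lesson 1: lower-scored items had much lower CTR

# Invented (doc_id, relevance_score) pairs standing in for Lucene results.
results = [(f"paper-{i}", 1.0 / (i + 1)) for i in range(50)]

candidates = [(doc, s) for doc, s in results if s >= THRESHOLD]

top10 = candidates[:10]                     # recommend these ...
random10 = random.sample(results[:50], 10)  # ... not these (-15% CTR above)
print(top10[:3])
```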

    Real-World Recommender Systems for Academia: The Pain and Gain in Building, Operating, and Researching them [Long Version]

    Research on recommender systems is a challenging task, as is building and operating such systems. Major challenges include non-reproducible research results, dealing with noisy data, and answering many questions, such as how many recommendations to display, how often, and, of course, how to generate recommendations most effectively. In the past six years, we built three research-article recommender systems for digital libraries and reference managers, and conducted research on these systems. In this paper, we share some of the experiences we gained during that time. Among other things, we discuss the skills required to build recommender systems, and why the literature provides little help in identifying promising recommendation approaches. We explain the challenge of creating a randomization engine to run A/B tests, and how low data quality impacts the calculation of bibliometrics. We further discuss why several of our experiments delivered disappointing results, and provide statistics on how many researchers showed interest in our recommendation dataset.

    Comment: This article is a long version of the article published in the Proceedings of the 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR).
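    A minimal sketch of a randomization engine for A/B tests like the one discussed above, in Python: hash the user id to assign each user deterministically to one experimental arm, so assignments stay stable across requests. The arm names are illustrative assumptions, not the authors' actual configuration.

```python
# Deterministic A/B arm assignment by hashing the user id.
import hashlib

ARMS = ["terms", "keyphrases", "embeddings"]  # hypothetical algorithm arms

def assign_arm(user_id: str) -> str:
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return ARMS[int(digest, 16) % len(ARMS)]

print(assign_arm("user-42"))  # the same user always lands in the same arm
```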

    Implementing Neural Turing Machines

    Neural Turing Machines (NTMs) are an instance of Memory Augmented Neural Networks, a class of recurrent neural networks that decouple computation from memory by introducing an external memory unit. NTMs have demonstrated superior performance over Long Short-Term Memory cells on several sequence learning tasks. A number of open-source implementations of NTMs exist, but they are unstable during training and/or fail to replicate the reported performance of NTMs. This paper presents the details of our successful implementation of an NTM. Our implementation learns to solve three sequential learning tasks from the original NTM paper. We find that the choice of memory-contents initialization scheme is crucial to successfully implementing an NTM. Networks with memory contents initialized to small constant values converge on average two times faster than with the next-best memory-contents initialization scheme.
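    A minimal sketch of the key finding above, in Python/NumPy: initialize the NTM's external memory contents to a small constant rather than to, e.g., random values. The memory shape and the constant are illustrative assumptions; a full NTM implementation is far more involved.

```python
# Memory-contents initialization for an NTM-style external memory.
import numpy as np

N, M = 128, 20  # number of memory slots x slot width (hypothetical sizes)

memory_constant = np.full((N, M), 1e-6)      # small-constant initialization
memory_random = 0.5 * np.random.randn(N, M)  # one alternative scheme
```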

    Towards Effective Research-Paper Recommender Systems and User Modeling based on Mind Maps

    While user modeling and recommender systems successfully utilize items like emails, news, and movies, they widely neglect mind maps as a source for user modeling. We consider this a serious shortcoming, since we assume user modeling based on mind maps to be as effective as user modeling based on other items. Hence, millions of mind-mapping users could benefit from user-modeling applications such as recommender systems. The objective of this doctoral thesis is to develop an effective user-modeling approach based on mind maps. To achieve this objective, we integrate a recommender system into our mind-mapping and reference-management software Docear. The recommender system builds user models from the mind maps and recommends research papers based on those user models. As part of our research, we identify several variables relating to mind-map-based user modeling and evaluate their impact on user-modeling effectiveness with an offline evaluation, a user study, and an online evaluation based on 430,893 recommendations displayed to 4,700 users. We find, among other things, that the number of analyzed nodes, modification time, visibility of nodes, relations between nodes, and the number of children and siblings of a node affect the effectiveness of user modeling. When all variables are combined in a favorable way, this novel approach achieves click-through rates of 7.20%, which is nearly twice as effective as the best baseline. In addition, we show that user modeling based on mind maps performs about as well as user modeling based on other items, namely the research articles users downloaded or cited. Our findings lead us to conclude that user modeling based on mind maps is a promising research field, and that developers of mind-mapping applications should integrate recommender systems into their applications. Such systems could create additional value for millions of mind-mapping users.

    Comment: PhD Thesis, Otto-von-Guericke University Magdeburg, Germany.
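    A minimal sketch of mind-map-based user modeling as described above, in Python: build a weighted term profile from mind-map nodes, giving more weight to, e.g., visible nodes. The node structure and the weights are illustrative assumptions, not Docear's actual algorithm.

```python
# Build a term-based user model from mind-map nodes (toy example).
from collections import Counter

nodes = [
    {"text": "recommender systems evaluation", "visible": True},
    {"text": "collaborative filtering", "visible": False},
    {"text": "research paper recommender systems", "visible": True},
]

profile = Counter()
for node in nodes:
    weight = 2 if node["visible"] else 1  # visible nodes count double
    for term in node["text"].lower().split():
        profile[term] += weight

print(profile.most_common(3))  # terms to match against candidate papers
```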

    One-at-a-time: A Meta-Learning Recommender-System for Recommendation-Algorithm Selection on Micro Level

    The effectiveness of recommendation algorithms is typically assessed with evaluation metrics such as root mean square error (RMSE), F1, or click-through rate, calculated over entire datasets, and the best algorithm is typically chosen based on these overall metrics. However, there is no single best algorithm for all users, items, and contexts, so choosing a single algorithm based on overall evaluation results is not optimal. In this paper, we propose a meta-learning-based approach to recommendation that aims to select the best algorithm for each user-item pair. We evaluate our approach using the MovieLens 100K and 1M datasets. Our approach (RMSE, 100K: 0.973; 1M: 0.908) did not outperform the single best algorithm, SVD++ (RMSE, 100K: 0.942; 1M: 0.887). We also develop a distinction between meta-learners that operate per instance (micro level), per data subset (mid level), and per dataset (global level). Our evaluation shows that a hypothetically perfect micro-level meta-learner would improve RMSE by 25.5% on the MovieLens 100K and 1M datasets, compared to the overall-best algorithms.
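    A minimal sketch of the hypothetically perfect micro-level meta-learner above, in Python/NumPy: for every user-item pair, take whichever base algorithm's prediction has the smaller absolute error, then compute RMSE over these per-instance choices. The ratings and predictions are invented; in the paper, the base algorithms are real recommenders such as SVD++.

```python
# Oracle (perfect micro-level) algorithm selection between two predictors.
import numpy as np

y_true = np.array([4.0, 3.0, 5.0, 2.0])   # true ratings
pred_a = np.array([3.5, 3.4, 4.2, 2.1])   # base algorithm A
pred_b = np.array([4.4, 2.9, 4.9, 3.0])   # base algorithm B

def rmse(pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((pred - y_true) ** 2)))

# Per instance, keep the prediction with the smaller absolute error.
oracle = np.where(np.abs(pred_a - y_true) <= np.abs(pred_b - y_true),
                  pred_a, pred_b)
print(rmse(pred_a), rmse(pred_b), rmse(oracle))  # the oracle is never worse
```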

    Exploring Choice Overload in Related-Article Recommendations in Digital Libraries

    We investigate the problem of choice overload - the difficulty of making a decision when faced with many options - when displaying related-article recommendations in digital libraries. So far, research on how many items should be displayed has mostly been done in the fields of media recommendations and search engines. We analyze the number of recommendations shown in current digital libraries. When browsed fullscreen on a laptop or desktop PC, all of them display a fixed number of recommendations: 72% display three, four, or five recommendations, and none display more than ten. We provide results from an empirical evaluation conducted with GESIS' digital library Sowiport, with recommendations delivered by the recommendations-as-a-service provider Mr. DLib. We use click-through rate as a measure of recommendation effectiveness, based on 3.4 million delivered recommendations. Our results show lower click-through rates for higher numbers of displayed recommendations, but twice as many clicked recommendations when displaying ten related articles instead of one. Our results indicate that users might quickly feel overloaded by choice.

    Comment: Accepted for publication at the 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR2017).
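    A minimal sketch of the analysis above, in Python: group delivered recommendation sets by how many items were displayed and compare click-through rates. The column names and values are invented for illustration.

```python
# Click-through rate per number of displayed recommendations (toy data).
import pandas as pd

deliveries = pd.DataFrame({
    "n_displayed": [1, 1, 5, 5, 10, 10],  # set size shown to the user
    "n_clicked":   [0, 1, 1, 0,  2,  1],  # clicks the set received
})
deliveries["impressions"] = deliveries["n_displayed"]

agg = deliveries.groupby("n_displayed").agg(
    clicks=("n_clicked", "sum"), impressions=("impressions", "sum"))
agg["ctr"] = agg["clicks"] / agg["impressions"]
print(agg)  # CTR tends to fall as set size grows; total clicks can still rise
```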

    Meta-Learned Per-Instance Algorithm Selection in Scholarly Recommender Systems

    The effectiveness of recommender-system algorithms varies across real-world scenarios. Choosing the best algorithm for a scenario is difficult because of the quantity of algorithms available and their varying performance, and no single algorithm works optimally for all recommendation requests. We apply meta-learning to this problem of algorithm selection for scholarly-article recommendation. We train a random forest, a gradient boosting machine, and a generalized linear model to predict the best algorithm from a pool of content-similarity-based algorithms. We evaluate our approach on an offline dataset for scholarly-article recommendation, attempting to predict the best algorithm per instance. The best meta-learning model achieved an average increase in F1 of 88% compared to the average F1 of all base algorithms (F1: 0.0708 vs. 0.0376) and selected the correct base algorithm at a statistically significant rate (paired t-test; p < 0.1). The meta-learner had a 3% higher F1 than the single best base algorithm (F1: 0.0739 vs. 0.0717). We further perform an online evaluation of our approach, conducting an A/B test through our recommender-as-a-service platform Mr. DLib, in which we delivered 148K recommendations to users between January and March 2019. User engagement increased significantly for recommendations generated with our meta-learning approach compared to a random selection of algorithms (click-through rate (CTR): 0.51% vs. 0.44%; chi-squared test; p < 0.1); however, our approach did not produce a higher CTR than the best algorithm alone (CTR, MoreLikeThis (Title): 0.58%).
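    A minimal sketch of the meta-learning setup described above, in Python: a random forest that predicts, per recommendation request, which base algorithm will perform best. The meta-features and labels are random placeholders; the paper derives them from document and request metadata.

```python
# Per-instance algorithm selection with a random-forest meta-learner.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 4))           # per-request meta-features (placeholder)
y = rng.integers(0, 3, size=200)   # index of the best base algorithm

meta = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

algorithms = ["MoreLikeThis (Title)", "keyphrases", "embeddings"]  # hypothetical pool
request = rng.random((1, 4))
print(algorithms[int(meta.predict(request)[0])])  # choice for this request
```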
    • …